Advances in experimental and theoretical work increasingly suggest that parasite interactions within a single host can affect the spread and severity of wildlife diseases. Yet empirical data to support predicted co-infection patterns are limited due to the practical challenges of gathering convincing data from animal populations and the stochastic nature of parasite transmission.
Here, we investigated co-infection patterns between micro- (bacteria and protozoa) and macroparasites (gastrointestinal helminths) in natural populations of the multimammate mouse (Mastomys natalensis).
We hypothesize that:
the host’s gastrointestinal helminth community has a significant effect on the presence of these microparasites due to immunomodulation.
Highly explorative individuals are more likely to be infected with microparasites compared to less explorative individuals
Mastomys natalensis
First, we need to setup R. HMSC models generally generate a lot of output (models, panels, etc). It is therefore easier to put this in different maps in order to avoid chaos.
So, let’s create 3 different folders to store models, data and panels
localDir = "."
ModelDir = file.path(localDir, "models")
DataDir = file.path(localDir, "data")
Now we can read in the data.
SXY = read.csv2("BartPaper.csv", stringsAsFactors=TRUE,sep=",")
The file that we’ll load is the full dataset.
It consists generally out of three parts:
Let’s first look at the data
## Individual Trap_Site
## 1 TZ34866 co-3
## 2 TZ34867 co-3
## 3 TZ34868 co-3
## 4 TZ34869 co-3
The study design consists out of:
We Trapped the rodents in Morogoro, Tanzania on 11 different sites.
The mean distance between every plot is 992m (min = 76m, max = 2073m).
The average dispersion distance of M. natalensis is 300m, while the average home range sizes are relatively small (±650m² using capture-mark recapture data and ±1200m² using radio tracking). Most plots are spaced more than 300m from each other and are therefore independent of each other, except for plots CO1, CO3 and CO10 which are indeed very close to each other (marked in red on the table). In these plots, we are probably sampling the same population. We therefore decided to consider these three locations as one.
The number of trapped individuals per site are:
##
## co-11 co-2 co-3 co-4 co-5 co-6 co-7 co-8 co-9
## 2 3 88 13 26 24 3 41 11
We will use the following variables as fixed effects in the model:
Let’s look at all of them separeltly.
##
## F M
## 136 75
Fewer males than females. Still the dataset looks sufficient.
Looks good.
An important driver of parasitic infection probability is the individuals’ exposure time to the parasite. The longer you are exposed, the more likely you are to eventually become infected.
This can be estimated either using the individuals reproductive age (adult or juveniles; adults are assumed to have a longer exposure time than juveniles) or by using Eye Lens weight which is an unbiased proxy for age in small mammals, and thus consequently, exposure time.
We’ll first look at eye lens weight
We should note that were not able to measure the eye lens weight of two individuals. We therefore decided to give them the mean value, and therefore will not influence the results.
SXY$EyeLens <- as.numeric(as.character(SXY$EyeLens))
A<-SXY$EyeLens[1:30]
B<-SXY$EyeLens[33:211]
Q<-c(A,B)
SXY$EyeLens[31]<-mean(Q)
SXY$EyeLens[32]<-mean(Q)
From this graph, it is clear that there are two big age categories. A large proportion have an eye lens weight of less than 20mg, while there is a group with an eye lens weight of more than 25mg.
Let’s transfer this to the actual age, based on the the following
formula:
\[
w = -10.46088 + (4.35076 * ln(a))
\]
with:
If we transform that, it will be:
\[ a = e^{ \frac{10.46088 + w }{4.35076} } \]
So let’s calculate this for each ID and plot it
SXY$AgeDays <- exp((10.46088+(SXY$EyeLens/2))/4.35076)
From here, it is clear that a large group is less than 100 days old,
while others are almost one year old.
This means there are two different age groups: those born in the year
when we conducted our study, and others which were borne during the
previous breeding seasons, the year before.
THere are now multiple ways to include this into the model:
The first option will lead to a large bias, since the the small proportion of animals that were born in the previous breeding season will have an extreme large effect on the correlation.
Grouping them in different categories is therefore more sensible. We will divide them in the following groups:
Let’s make these groups in the dataset
SXY$Age2 <-c(1:nrow(SXY)) # MAke a new column first
for (i in 1:nrow(SXY)){
if (SXY$AgeDays[i]<50){
SXY$Age2[i] <- 1
} else {
SXY$Age2[i] <- 2
}
}
for (i in 1:nrow(SXY)){
if (SXY$AgeDays[i]>200){
SXY$Age2[i] <- 3
}
}
SXY$Age2<-as.factor(SXY$Age2)
table(SXY$Age2)
##
## 1 2 3
## 100 88 23
Now let’s take a look at reproductive age.
We see that juveniles are indeed a lot younger than adults (except
three individuals, which we will investigate later on). This also
confirms that the assumptions in Vanden Broecke et al. (2021) were
correct.
Additionally, the two individuals without eye lens weight were also
considered to be adults (the two red points), which is also reassuring
that they will have little effect on the dataset.
Now, it is clear that we cannot run a model with both Reproductive age and age in days as fixed effects. We will therefore have two seprate models where we either have reproductive age as a fixed effect, or age (in days, based on the three categories we defined previously).
Oke, done.
We have several parasites that we will include in our model:
First, we’ll look at the helminths
A quick view, we see that that there is some correlation between the
different infections.
For instance, individuals infected with Protospirura are also more
likely to be infected with Dava and Trichu.
The number of infected individuals and the corresponding prevalence is shown in the table below
| Names | InfectedID | Prevalence |
|---|---|---|
| Hymenolepis | 96 | 0.4549763 |
| Davaineida | 59 | 0.2796209 |
| Protospirura | 74 | 0.3507109 |
| Trichostrongylidae | 30 | 0.1421801 |
| Trichuris | 30 | 0.1421801 |
| Syphacia | 45 | 0.2132701 |
For Bartonella and Anaplasma, things are a bit more
complicated.
Some samples were uncertain (i.e., individuals that were repeatedly
positive on the qPCR but negative for Sanger sequencing on the
conventional PCR), we will therefore run two different models where we
considered all uncertain samples either as negative or positive (further
referred to as, respectively, the uncertain-negative and
uncertain-positive model).
First, we will need to account for this. We’ll add a column where we either consider the uncertains as positive or negative, both for Bartonella and for Anaplasma
# Bartonella
#### Uncertain --> POSITIVE
SXY$bartPOS <- as.numeric(SXY$bart)
for (i in 1:nrow(SXY)){
if(SXY$bart[i] == "neg"){SXY$bartPOS[i] <- 0}
if(SXY$bart[i] == "pos"){SXY$bartPOS[i] <- 1}
if(SXY$bart[i] == "uncertain"){SXY$bartPOS[i] <- 1}} # Uncertains are now positive
#### Uncertain --> Negative
SXY$bartNEG <- as.numeric(SXY$bart)
for (i in 1:nrow(SXY)){
if(SXY$bart[i] == "neg"){SXY$bartNEG[i] <- 0}
if(SXY$bart[i] == "pos"){SXY$bartNEG[i] <- 1}
if(SXY$bart[i] == "uncertain"){SXY$bartNEG[i] <- 0}} # Uncertains are now Negative
# Anaplasma
#### Uncertain --> POSITIVE
SXY$anaPOS <- as.numeric(SXY$ana)
for (i in 1:nrow(SXY)){
if(SXY$ana[i] == "neg"){SXY$anaPOS[i] <- 0}
if(SXY$ana[i] == "pos"){SXY$anaPOS[i] <- 1}
if(SXY$ana[i] == "uncertain"){SXY$anaPOS[i] <- 1}} # Uncertains are now positive
#### Uncertain --> Negative
SXY$anaNEG <- as.numeric(SXY$ana)
for (i in 1:nrow(SXY)){
if(SXY$ana[i] == "neg"){SXY$anaNEG[i] <- 0}
if(SXY$ana[i] == "pos"){SXY$anaNEG[i] <- 1}
if(SXY$ana[i] == "uncertain"){SXY$anaNEG[i] <- 0}} # Uncertains are now positive
| Names | Infected_ID_POSITIVE | Infected_ID_NEGATIVE | Prevalence_POSITIVE | Prevalence_NEGATIVE |
|---|---|---|---|---|
| Bartonella | 22 | 15 | 0.1042654 | 0.0710900 |
| Anaplasma | 40 | 31 | 0.1895735 | 0.1469194 |
In order to continue with the HMSC models, we need to split the S, X and Y data from each other and store it as a new R file.
For the positive models this will look like this
# S: study design
S=SXY[,1:2]
S[1:4,]
## Individual Trap_Site
## 1 TZ34866 co-3
## 2 TZ34867 co-3
## 3 TZ34868 co-3
## 4 TZ34869 co-3
# X: covariates to be used as predictors
X=SXY[,c(3,4,10,11,26)]
X[1:4,]
## Sex Age Exploration Stress_Sens Age2
## 1 F J -1.5465289 1.11937868 1
## 2 M A 0.9608683 1.30348080 3
## 3 F A 0.2064169 -0.28306460 3
## 4 M J 0.4458350 0.03908914 1
# Y: species data
Y=SXY[,c(12:17,33,35)]
Y[1:4,]
## Hymenolepis Davaineidae Protospirura Trichostrongylidae Trichuris Syphacia
## 1 0 0 0 2 0 0
## 2 1 3 2 72 6 0
## 3 0 2 41 0 2 0
## 4 0 0 0 1 0 6
## bartPOS anaPOS
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
# In order not to change the trait file, I will change the bartonella and anaplasma name
names(Y)<- c("Hymenolepis","Davaineidae","Protospirura","Trichostrongylidae","Trichuris","Syphacia" ,"bart2","ana2" )
Now, we will include the trait file. This basically groups the Y depending on the transmission mode (direct, indirect or vector)
# Will you provide and file with traits (T) or a Phylogeny?
is.TP = TRUE # traits will be included
is.P = FALSE # Phylogeny will NOT be included
# READING IN TP: traits (T) and/or phylogenetic information in table format (P)
if(is.TP){
# Read in the species names as rownames, not as a column of the matrix
TP = read.csv("TP3.csv", stringsAsFactors=TRUE,header=T,row.names = 1)
# The script below checks if the species names in TP are identical and in the same order as in Y
# If the script prints "species names in TP and SXY match", you are ok.
# If it says that they do not match, you need to modify the files so that they match
if(all(rownames(TP)==colnames(Y))) {
print("species names in TP and SXY match")
} else{
print("species names in TP and SXY do not match")
}
# Modify the next two lines to split your TP file to components that relate to
# Tr: species traits (note that T is a reserved word in R and that's why we use Tr)
# P: phylogenetic information given by taxonomical levels, e.g. order, family, genus, species
# If you don't have trait data, indicate this by Tr=NULL.
# If TP does not have phylogenetic data (because you don't have such data at all, or because
# it is given in tree-format, like is the case in this example), indicate this with P=NULL
Tr = TP[,1:1]
P = NULL
# Check that the data looks as it should!
View(Tr)
# Check that the Tr data do not have missing values (they are allowed for Y but not S, X, P or Tr)
if (any(is.na(Tr))) {
print("Tr has NA values - not allowed for")
} else {
print("Tr looks ok") }
# Check that the phylogenetic/taxonomic data do not have missing values (they are allowed for Y but not S, X, P or Tr)
if (any(is.na(P))) {
print("P has NA values - not allowed for")
} else {
print("P looks ok") }
}
## [1] "species names in TP and SXY match"
## [1] "Tr looks ok"
## [1] "P looks ok"
Tr = data.frame(TP)
names(Tr)= c("Trans_Mode","Path")
Tr
## Trans_Mode Path
## Hymenolepis Indirect Tape
## Davaineidae Indirect Tape
## Protospirura Indirect Nema
## Trichostrongylidae Direct Nema
## Trichuris Direct Nema
## Syphacia Direct Nema
## bart2 Vector ZBact
## ana2 Vector ZBact
Now let’s save the data!
# Save the data!
Y = as.matrix(Y)
save(S,X,Y,Tr,file="allDataPOS.R")
We need to do the same for the negative model obviously.
# Y: species data
Y=SXY[,c(12:17,34,36)]
# In order not to change the trait file, I will change the bartonella and anaplasma name
names(Y)<- c("Hymenolepis","Davaineidae","Protospirura","Trichostrongylidae","Trichuris","Syphacia" ,"bart2","ana2" )
# Will you provide and file with traits (T) or a Phylogeny?
is.TP = TRUE # traits will be included
is.P = FALSE # Phylogeny will NOT be included
# READING IN TP: traits (T) and/or phylogenetic information in table format (P)
if(is.TP){
# Read in the species names as rownames, not as a column of the matrix
TP = read.csv("TP3.csv", stringsAsFactors=TRUE,header=T,row.names = 1)
# The script below checks if the species names in TP are identical and in the same order as in Y
# If the script prints "species names in TP and SXY match", you are ok.
# If it says that they do not match, you need to modify the files so that they match
if(all(rownames(TP)==colnames(Y))) {
print("species names in TP and SXY match")
} else{
print("species names in TP and SXY do not match")
}
# Modify the next two lines to split your TP file to components that relate to
# Tr: species traits (note that T is a reserved word in R and that's why we use Tr)
# P: phylogenetic information given by taxonomical levels, e.g. order, family, genus, species
# If you don't have trait data, indicate this by Tr=NULL.
# If TP does not have phylogenetic data (because you don't have such data at all, or because
# it is given in tree-format, like is the case in this example), indicate this with P=NULL
Tr = TP[,1:1]
P = NULL
# Check that the data looks as it should!
View(Tr)
# Check that the Tr data do not have missing values (they are allowed for Y but not S, X, P or Tr)
if (any(is.na(Tr))) {
print("Tr has NA values - not allowed for")
} else {
print("Tr looks ok") }
# Check that the phylogenetic/taxonomic data do not have missing values (they are allowed for Y but not S, X, P or Tr)
if (any(is.na(P))) {
print("P has NA values - not allowed for")
} else {
print("P looks ok") }
}
Tr = data.frame(TP)
names(Tr)= c("Trans_Mode","Path")
Tr
# Save the data!
Y = as.matrix(Y)
save(S,X,Y,Tr,file="allDataNEG.R")
We are ready to construct the models!
Constructing the models is done via the following steps:
load(file=file.path("allDataPOS.R"))
# Make sure that exploration and stress sensitivity are numeric
X[,3:4] = apply(X[,3:4], 2, as.numeric)
# Create the fixed effects part of the model
XFormula = ~Sex + Age2+ Exploration + Stress_Sens
# We will look at the transmission mode
TrFormula = ~Trans_Mode
# Create the random effects part of the model
studyDesign = data.frame(individual = as.factor(S$Individual), site = as.factor(S$Trap_Site)) # Create the study design
St = studyDesign$site # Set site as a random effect
rL.site = HmscRandomLevel(units = levels(St))
Ind = studyDesign$individual # Set the individual as an additional fixed effect
rL.individual = HmscRandomLevel(units = levels(Ind))
Ypa = 1*(Y>0) # Transfer the data to presence/absence
# We are only interested in infection probability, rather than parasite load
# Presence-absence model
m1 = Hmsc(Y=Ypa, # y-variable (1/0)
XData = X, # X-variables
XFormula = XFormula, # Fixed effects part
TrData = Tr, # Trait data
TrFormula = TrFormula, # Trait effects part
distr="probit" , # Distribution
studyDesign=studyDesign, # Study design
ranLevels={list("site" = rL.site, # random effects
"individual" = rL.individual)})
# Place the model in a list
models = list(m1)
modelnames = c("presence_absence") # Give them the proper names
# Save the model in the model folder
save(models,modelnames,file = file.path(ModelDir, "unfitted_model_POS_EyeLens_LAST"))
We will also do the same for the positive model with reproductive age as a fixed effect. But I will spare you the code
Again, we’ll have two models, either with age based on the eye lens weight, and one with reproductive age
load(file=file.path("allDataNEG.R"))
# Make sure that exploration and stress sensitivity are numeric
X[,3:4] = apply(X[,3:4], 2, as.numeric)
# Create the fixed effects part of the model
XFormula = ~Sex + Age2+ Exploration + Stress_Sens
# We will look at the transmission mode
TrFormula = ~Trans_Mode
# Create the random effects part of the model
studyDesign = data.frame(individual = as.factor(S$Individual), site = as.factor(S$Trap_Site)) # Create the study design
St = studyDesign$site # Set site as a random effect
rL.site = HmscRandomLevel(units = levels(St))
Ind = studyDesign$individual # Set the individual as an additional fixed effect
rL.individual = HmscRandomLevel(units = levels(Ind))
Ypa = 1*(Y>0) # Transfer the data to presence/absence
# We are only interested in infection probability, rather than parasite load
# Presence-absence model
m1 = Hmsc(Y=Ypa, # y-variable (1/0)
XData = X, # X-variables
XFormula = XFormula, # Fixed effects part
TrData = Tr, # Trait data
TrFormula = TrFormula, # Trait effects part
distr="probit" , # Distribution
studyDesign=studyDesign, # Study design
ranLevels={list("site" = rL.site, # random effects
"individual" = rL.individual)})
# Place the model in a list
models = list(m1)
modelnames = c("presence_absence") # Give them the proper names
# Save the model in the model folder
save(models,modelnames,file = file.path(ModelDir, "unfitted_model_NEG_EyeLens_LAST"))
Now we do the same, but using reproductive age instead of age based on eye lens weight (code is again hidden to make this more readable)
Now that we have constructed the two models (positive and negative), we are able to run them.
In order to determine how when the model converges, we need to run
the models multiple time and see how big the thinning needs to be.
Luckely, we can do loop this.
The following code runs the model. More precisely, it will run several different models with different thinnings. THis is needed to determine the minimal thinning needed for the model to converge.
But first, we need to run the model where we use age based on eye lens weight
# First, we need to load the model
load(file = "models/unfitted_model_POS_EyeLens_LAST")
# Setup the model runnings parameters
samples_list = c(5,250,250,250,250) # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500) # We will increase the thinning with subsequent models
nChains = 5 # number of chains
# Run the models
for(Lst in 1:length(samples_list)){ # Creates the loop to run the different models
thin = thin_list[Lst] # takes the Thinning number out the list
samples = samples_list[Lst] # Takes the Number of samples number out the list
print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples))) # This will print which model is running
nm = length(models) # this is not really important here, because we are running one model (compared to a hurdle model)
for (model in 1:nm) {
print(paste0("model = ",modelnames[model])) # ads the model name
m = models[[model]] # this takes the model
# this is the actual running part
m = sampleMcmc(m, # Model info
samples = samples, # Number of samples
thin=thin, # thinning interval
adaptNf=rep(ceiling(0.4*samples*thin),m$nr), # Not sure
transient = ceiling(0.5*samples*thin), # burning? or the one above
nParallel = 5, # paralleling the chain over the different cores
nChains = nChains) # Number of chains
models[[model]] = m # voila!
}
# Now, it will save all the different models in the model folder
filename = paste("models/POSITIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
save(models,modelnames,file=filename)
}
Now we will run the model with reproductive age as fixed effect (code again hidden)
This code will generate a PDF which we can use to determine of the model converged or not.
This is the model with age based on eye lens weight
# Use the same numbers as before
samples_list = c(5,250,250,250,250) # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500) # We will increase the thinning with subsequent models
nChains = 5
nst = length(thin_list)
ma = NULL
na = NULL
# Run the loop
for (Lst in 1:nst) {
thin = thin_list[Lst]
samples = samples_list[Lst]
filename = paste("models/POSITIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),".Rdata",sep = "")
load(filename)
nm = length(models)
for(j in 1:nm){
mpost = convertToCodaObject(models[[j]], spNamesNumbers = c(T,F), covNamesNumbers = c(T,F))
psrf.beta = gelman.diag(mpost$Beta,multivariate=FALSE)$psrf
tmp = summary(psrf.beta)
if(is.null(ma)){
ma=psrf.beta[,1]
na = paste0(as.character(thin),",",as.character(samples))
} else {
ma = cbind(ma,psrf.beta[,1])
if(j==1){
na = c(na,paste0(as.character(thin),",",as.character(samples)))
} else {
na = c(na,"")
}
}
}
}
dev.new()
pdf(file=paste("models/MCMC_convergence_POS_EYELENS.pdf"))
par(mfrow=c(2,1))
vioplot(ma,col=rainbow_hcl(nm),names=na,ylim=c(0,max(ma)),main="psrf(beta)")
vioplot(ma,col=rainbow_hcl(nm),names=na,ylim=c(0.9,1.1),main="psrf(beta)")
dev.off()
# EXTRA FOR LATER
# vioplot(ma)
# max(ma)
# mean(ma)
# there is now a PDF file stored in the models map
We will run the same code for the model with reproductive age
Let’s take a look at the results generated from the code above.
The values need to be around 1. The two figures on the left are based on the model with age in days as fixed effect. The figures on the right are derived from the model with reproductive age as fixed effect.
It is clear this the models with a thining of 1-10 did not converge.
THe models with 100-500 are a lot better.
From this, I think that it’s safe to say that a thinning of 1.000 should
be sufficient for the full model.
We are now ready to run the full models.
Here is the code to run the full model with age based on eye lens weight
as a a fixed effect
# First, we need to load the model
load(file = "models/unfitted_model_POS_EyeLens_LAST")
# Setup the model runnings parameters
samples_list = c(2000) # We will take 2000 posterior samples
thin_list = c(1000) # thinning is 1000
nChains = 5 # number of chains
# Run the models
for(Lst in 1:length(samples_list)){ # Creates the loop to run the different models
thin = thin_list[Lst] # takes the Thinning number out the list
samples = samples_list[Lst] # Takes the Number of samples number out the list
print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples))) # This will print which model is running
nm = length(models) # this is not really important here, because we are running one model (compared to a hurdle model)
for (model in 1:nm) {
print(paste0("model = ",modelnames[model])) # ads the model name
m = models[[model]] # this takes the model
# this is the actual running part
m = sampleMcmc(m, # Model info
samples = samples, # Number of samples
thin=thin, # thinning interval
adaptNf=rep(ceiling(0.4*samples*thin),m$nr), # Not sure
transient = ceiling(0.5*samples*thin), # burning? or the one above
nParallel = 5, # paralleling the chain over the different cores
nChains = nChains) # Number of chains
models[[model]] = m # voila!
}
# Now, it will save all the different models in the model folder
filename = paste("models/POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
save(models,modelnames,file=filename)
}
Now we will do the same for the model with reproductive age
Let’s quickly check if the models converged.
## [1] 1.001895
## [1] 1.000292
## [1] 1.008809
## [1] 1.000745
Yes, both models have converged (values should be around 1, which is the case here).
We need to do the same for the negative model as well.
So first the model with age based on eye lens weight as a fixed.
# First, we need to load the model
load(file = "models/unfitted_model_NEG_EyeLens_LAST")
# Setup the model runnings parameters
samples_list = c(5,250,250,250,250) # We will take only 250 samples first, to make it fast
thin_list = c(1,1,10,100,500) # We will increase the thinning with subsequent models
nChains = 5 # number of chains
# Run the models
for(Lst in 1:length(samples_list)){ # Creates the loop to run the different models
thin = thin_list[Lst] # takes the Thinning number out the list
samples = samples_list[Lst] # Takes the Number of samples number out the list
print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples))) # This will print which model is running
nm = length(models) # this is not really important here, because we are running one model (compared to a hurdle model)
for (model in 1:nm) {
print(paste0("model = ",modelnames[model])) # ads the model name
m = models[[model]] # this takes the model
# this is the actual running part
m = sampleMcmc(m, # Model info
samples = samples, # Number of samples
thin=thin, # thinning interval
adaptNf=rep(ceiling(0.4*samples*thin),m$nr), # Not sure
transient = ceiling(0.5*samples*thin), # burning? or the one above
nParallel = 5, # paralleling the chain over the different cores
nChains = nChains) # Number of chains
models[[model]] = m # voila!
}
# Now, it will save all the different models in the model folder
filename = paste("models/NEGATIVE_EYELENS_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
save(models,modelnames,file=filename)
}
Now we’ll do the same for the model with reproductive age as fixed effect. Again the code is hidden because it is very similar.
This code will generate a PDF which we can use to determine of the model converged or not. THe codes are the same as above, so we won’t include it (just to save some place)
The values need to be around 1. The two figures on the left are based on the model with age in days as fixed effect. The figures on the right are derived from the model with reproductive age as fixed effect.
It is clear this the models with a thining of 1-10 did not converge.
THe models with 100-500 are a lot better.
From this, I think that it’s safe to say that a thinning of 1.000 should
be sufficient for the full model.
Now we are ready to run the full model.
Again, this model includes the three ages classes, based on eye lens weight, as fixed effect
# First, we need to load the model
load(file = "models/unfitted_model_NEG_EyeLens_LAST")
# Setup the model runnings parameters
samples_list = c(2000) # We will take 2000 posterior samples
thin_list = c(1000) # thinning is 1000
nChains = 5 # number of chains
# Run the models
for(Lst in 1:length(samples_list)){ # Creates the loop to run the different models
thin = thin_list[Lst] # takes the Thinning number out the list
samples = samples_list[Lst] # Takes the Number of samples number out the list
print(paste0("thin = ",as.character(thin),"; samples = ",as.character(samples))) # This will print which model is running
nm = length(models) # this is not really important here, because we are running one model (compared to a hurdle model)
for (model in 1:nm) {
print(paste0("model = ",modelnames[model])) # ads the model name
m = models[[model]] # this takes the model
# this is the actual running part
m = sampleMcmc(m, # Model info
samples = samples, # Number of samples
thin=thin, # thinning interval
adaptNf=rep(ceiling(0.4*samples*thin),m$nr), # Not sure
transient = ceiling(0.5*samples*thin), # burning? or the one above
nParallel = 5, # paralleling the chain over the different cores
nChains = nChains) # Number of chains
models[[model]] = m # voila!
}
# Now, it will save all the different models in the model folder
filename = paste("models/NEGATIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
save(models,modelnames,file=filename)
}
We will do the same, as usual, with the model with reproductive age as fixed effect
Let’s quickly check if the models converged.
## [1] 1.000503
## [1] 1.005297
## [1] 1.002612
## [1] 1.020796
Also here, both models converged nicely.
In this part, we’ll take a closer look at the results from the final models of the positive and negative model. We will layout the results next to each other and compare then.
Below is the code that is used to generate a pdf with the parameter estimates. The code here is written for the positive model with age in days (based on eye lense weight) as fixed effect. We have ran the same code for the other models, but this is not shown, since it’s the exact same code.
# POSITIVE MODEL EYE LENS
# Use the same numbers as before
samples = 2000 # Number of samples for the final model
thin = 1000 # Thinning of the final model
nChains = 5
nst = length(thin_list)
# load the final model
filename = paste("models/POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
# Load models
load(filename)
nm = length(models)
filename = paste("panels/parameter_FINAL_estimates_MARK_POS_EYELENS.pdf")
pdf(file = filename)
for(j in 1:nm){
#j=1
m = models[[j]]
VP = computeVariancePartitioning(m)
vals = VP$vals
mycols = rainbow(nrow(VP$vals))
plotVariancePartitioning(hM=m, VP=VP,cols = mycols, args.leg=list(bg="white",cex=0.7),
main = paste0("Proportion of explained variance, ",modelnames[[j]]),cex.main=0.8)
preds = computePredictedValues(m)
MF = evaluateModelFit(hM=m, predY=preds)
R2 = NULL
if(!is.null(MF$TjurR2)){
TjurR2 = MF$TjurR2
vals = rbind(vals,TjurR2)
R2=TjurR2
}
if(!is.null(MF$R2)){
R2=MF$R2
vals = rbind(vals,R2)
}
filename = paste0("panels/parameter_estimates_VP_",modelnames[[j]],".csv")
write.csv(vals,file=filename)
if(!is.null(R2)){
VPr = VP
for(k in 1:m$ns){
VPr$vals[,k] = (1-R2[k])*VPr$vals[,k]
}
VPr$vals = VPr$vals[,order(R2)]
plotVariancePartitioning(hM=m, VP=VPr,cols = mycols, args.leg=list(bg="white",cex=0.7),ylim=c(0,1),
main=paste0("Proportion of raw variance, ",modelnames[[j]]),cex.main=0.8)
filename = paste0("panels/parameter_estimates_VP_Test_",modelnames[[j]],".csv")
write.csv( VPr$vals,file=filename)
}
}
for(j in 1:nm){
#j = 2
m = models[[j]]
postBeta = getPostEstimate(m, parName="Beta")
show.sp.names = (is.null(m$phyloTree) && m$ns<=20)
plotBeta(m,
post=postBeta,
supportLevel = 0.95,
param="Sign",
plotTree = !is.null(m$phyloTree),
covNamesNumbers = c(TRUE,FALSE),
spNamesNumbers=c(show.sp.names,FALSE),
cex=c(0.6,0.6,0.8))
mymain = paste0("BetaPlot, ",modelnames[[j]])
if(!is.null(m$phyloTree)){
mpost = convertToCodaObject(m)
rhovals = unlist(poolMcmcChains(mpost$Rho))
mymain = paste0(mymain,", E[rho] = ",round(mean(rhovals),2),", Pr[rho>0] = ",round(mean(rhovals>0),2))
}
title(main=mymain, line=2.5, cex.main=0.8)
me = as.data.frame(t(postBeta$mean))
me = cbind(m$spNames,me)
colnames(me) = c("Species",m$covNames)
po = as.data.frame(t(postBeta$support))
po = cbind(m$spNames,po)
colnames(po) = c("Species",m$covNames)
ne = as.data.frame(t(postBeta$supportNeg))
ne = cbind(m$spNames,ne)
colnames(ne) = c("Species",m$covNames)
vals = list("Posterior mean"=me,"Pr(x>0)"=po,"Pr(x<0)"=ne)
filename = paste0("panels/parameter_estimates_Beta_",modelnames[j],".xlsx")
writexl::write_xlsx(vals,path = filename)
}
for(j in 1:nm){
if(m$nt>1){
m = models[[j]]
postGamma = getPostEstimate(m, parName="Gamma")
plotGamma(m, post=postGamma, supportLevel = 0.9, param="Sign",
covNamesNumbers = c(TRUE,FALSE),
trNamesNumbers=c(m$nt<21,FALSE),
cex=c(0.6,0.6,0.8))
title(main=paste0("GammaPlot ",modelnames[[j]]), line=2.5,cex.main=0.8)
}
}
for(j in 1:nm){
#j=1
m = models[[j]]
OmegaCor = computeAssociations(m)
supportLevel = 0.95
for (r in 1:m$nr){
#r = 1
plotOrder = corrMatOrder(OmegaCor[[r]]$mean,order="AOE")
plotOrder = 1:m$ns
toPlot = ((OmegaCor[[r]]$support>supportLevel) + (OmegaCor[[r]]$support<(1-supportLevel))>0)*sign(OmegaCor[[r]]$mean)
#colnames(toPlot)=rep("",m$ns)
#rownames(toPlot)=rep("",m$ns)
mymain = paste0("Associations, ",modelnames[[j]], ": ",names(m$ranLevels)[[r]])
if(m$ranLevels[[r]]$sDim>0){
mpost = convertToCodaObject(m)
alphavals = unlist(poolMcmcChains(mpost$Alpha[[1]][,1]))
mymain = paste0(mymain,", E[alpha1] = ",round(mean(alphavals),2),", Pr[alpha1>0] = ",round(mean(alphavals>0),2))
}
corrplot(toPlot[plotOrder,plotOrder], method = "color",
col=colorRampPalette(c("blue","white","red"))(3),
mar=c(0,0,1,0),
main=mymain,cex.main=0.8)
me = as.data.frame(OmegaCor[[r]]$mean)
me = cbind(m$spNames,me)
colnames(me)[1] = ""
po = as.data.frame(OmegaCor[[r]]$support)
po = cbind(m$spNames,po)
colnames(po)[1] = ""
ne = as.data.frame(1-OmegaCor[[r]]$support)
ne = cbind(m$spNames,ne)
colnames(ne)[1] = ""
vals = list("Posterior mean"=me,"Pr(x>0)"=po,"Pr(x<0)"=ne)
filename = paste0("panels/parameter_estimates_Omega_",modelnames[[j]],"_",names(m$ranLevels)[[r]],".xlsx")
writexl::write_xlsx(vals,path = filename)
}
}
dev.off()
We end up with the following results from both models.
We see that the age classes explain a lot of variation in some species, but not in others.
Here, we see the results from the 2 models with different covariates (A: model with age classes defined by eye lens weight, B = model with age defined by reproductive state). The blue points and error bars are the positve model, the red one is the negative model.
Basically, it looks like the results are quite similar.
| Fixed_effects | Anaplasma | Bartonella | Davaineidae | Hymenolepis | Protospirura | Syphacia | Trichostrongylidae | Trichuris |
|---|---|---|---|---|---|---|---|---|
| Intercept | -1.41 (-2.16;-0.74) | -2.01 (-2.62;-1.48) | -1.66 (-3.77;-0.63) | -0.51 (-0.99;-0.12) | -0.92 (-1.28;-0.58) | -1.17 (-1.53;-0.82) | -1.52 (-1.92;-1.15) | -1.66 (-2.13;-1.26) |
| Sex (male) | -0.17 (-0.67;0.31) | -0.59 (-1.22;-0.02) | -1.12 (-2.83;-0.27) | -0.26 (-0.76;0.15) | -0.81 (-1.26;-0.4) | 0.26 (-0.12;0.63) | -0.04 (-0.46;0.38) | -0.03 (-0.46;0.4) |
| Exploration | 0.04 (-0.2;0.28) | -0.47 (-0.8;-0.18) | -0.54 (-1.63;-0.06) | 0.13 (-0.08;0.37) | -0.17 (-0.38;0.04) | -0.07 (-0.27;0.12) | -0.01 (-0.23;0.21) | -0.12 (-0.36;0.11) |
| Stress-sensitivity | -0.03 (-0.27;0.21) | -0.3 (-0.57;-0.03) | -0.45 (-1.38;-0.02) | 0.15 (-0.06;0.41) | -0.1 (-0.3;0.09) | -0.03 (-0.22;0.16) | -0.02 (-0.24;0.2) | -0.19 (-0.43;0.04) |
| Age | ||||||||
| (50 < X < 200 days) | 0.87 (0.38;1.36) | 0.88 (0.3;1.52) | 0.86 (0.17;1.89) | 0.88 (0.43;1.47) | 1.08 (0.67;1.51) | 0.21 (-0.19;0.6) | 0.34 (-0.09;0.78) | 0.58 (0.14;1.07) |
| Age | ||||||||
| (> 200 days) | 0.5 (-0.3;1.25) | 1.36 (0.64;2.11) | 1.53 (0.49;3.14) | 1.34 (0.65;2.2) | 2.06 (1.41;2.8) | 0.7 (0.12;1.26) | 1.31 (0.75;1.88) | 1.42 (0.82;2.07) |
| Fixed_effects | Anaplasma | Bartonella | Davaineidae | Hymenolepis | Protospirura | Syphacia | Trichostrongylidae | Trichuris |
|---|---|---|---|---|---|---|---|---|
| Intercept | -1.75 (-2.47;-1.11) | -2.29 (-2.93;-1.7) | -1.79 (-3.88;-0.71) | -0.52 (-0.99;-0.13) | -0.91 (-1.27;-0.58) | -1.18 (-1.54;-0.84) | -1.53 (-1.93;-1.17) | -1.64 (-2.06;-1.26) |
| Sex (male) | 0.11 (-0.38;0.6) | -0.27 (-0.91;0.31) | -1.19 (-2.82;-0.3) | -0.28 (-0.79;0.14) | -0.8 (-1.23;-0.4) | 0.27 (-0.09;0.64) | -0.03 (-0.46;0.38) | -0.01 (-0.45;0.42) |
| Exploration | 0.06 (-0.2;0.31) | -0.47 (-0.84;-0.14) | -0.6 (-1.74;-0.07) | 0.14 (-0.09;0.38) | -0.17 (-0.37;0.03) | -0.07 (-0.27;0.13) | -0.01 (-0.23;0.21) | -0.12 (-0.36;0.11) |
| Stress-sensitivity | -0.04 (-0.29;0.2) | -0.4 (-0.71;-0.11) | -0.5 (-1.48;-0.02) | 0.16 (-0.06;0.43) | -0.1 (-0.3;0.1) | -0.03 (-0.22;0.16) | -0.03 (-0.25;0.19) | -0.19 (-0.43;0.04) |
| Age | ||||||||
| (50 < X < 200 days) | 1.06 (0.54;1.58) | 0.64 (0.02;1.3) | 0.89 (0.14;1.91) | 0.93 (0.46;1.52) | 1.08 (0.68;1.5) | 0.22 (-0.18;0.61) | 0.34 (-0.09;0.77) | 0.57 (0.12;1.04) |
| Age | ||||||||
| (> 200 days) | 0.59 (-0.23;1.39) | 1.31 (0.56;2.09) | 1.66 (0.58;3.21) | 1.38 (0.69;2.21) | 2.07 (1.44;2.74) | 0.72 (0.15;1.26) | 1.31 (0.76;1.86) | 1.4 (0.83;2.01) |
| Fixed_effects | Anaplasma | Bartonella | Davaineidae | Hymenolepis | Protospirura | Syphacia | Trichostrongylidae | Trichuris |
|---|---|---|---|---|---|---|---|---|
| Intercept | -0.89 (-1.52;-0.34) | -1.33 (-1.84;-0.95) | -0.34 (-0.66;-0.03) | 0.16 (-0.14;0.44) | -0.01 (-0.33;0.3) | -0.92 (-1.22;-0.64) | -1.1 (-1.46;-0.79) | -1.58 (-2.89;-0.96) |
| Sex (male) | -0.17 (-0.64;0.3) | -0.54 (-1.18;0.02) | -0.66 (-1.09;-0.25) | -0.24 (-0.61;0.12) | -0.99 (-1.78;-0.46) | 0.22 (-0.15;0.6) | -0.08 (-0.52;0.36) | -0.14 (-0.86;0.44) |
| Exploration | -0.07 (-0.31;0.16) | -0.57 (-0.94;-0.25) | -0.3 (-0.52;-0.1) | 0.02 (-0.16;0.2) | -0.34 (-0.68;-0.09) | -0.11 (-0.31;0.08) | -0.09 (-0.32;0.12) | -0.32 (-0.78;0.01) |
| Stress-sensitivity | -0.04 (-0.26;0.18) | -0.3 (-0.59;-0.03) | -0.21 (-0.42;-0.01) | 0.1 (-0.08;0.28) | -0.13 (-0.41;0.11) | -0.05 (-0.24;0.14) | -0.05 (-0.27;0.18) | -0.31 (-0.79;0.02) |
| Age | ||||||||
| Juvenile | -0.75 (-1.38;-0.16) | -1.02 (-1.97;-0.22) | -0.96 (-1.5;-0.43) | -0.9 (-1.37;-0.45) | -1.67 (-2.83;-0.94) | -0.31 (-0.78;0.15) | -0.58 (-1.16;-0.04) | -0.91 (-1.99;-0.15) |
| Fixed_effects | Anaplasma | Bartonella | Davaineidae | Hymenolepis | Protospirura | Syphacia | Trichostrongylidae | Trichuris |
|---|---|---|---|---|---|---|---|---|
| Intercept | -1.11 (-1.68;-0.6) | -1.64 (-2.1;-1.24) | -0.35 (-0.7;-0.03) | 0.17 (-0.12;0.46) | 0 (-0.29;0.29) | -0.94 (-1.26;-0.65) | -1.14 (-1.54;-0.81) | -1.69 (-3.31;-0.96) |
| Sex (male) | 0.07 (-0.41;0.54) | -0.21 (-0.8;0.35) | -0.66 (-1.1;-0.24) | -0.25 (-0.63;0.11) | -0.92 (-1.57;-0.42) | 0.22 (-0.16;0.61) | -0.07 (-0.54;0.37) | -0.15 (-0.97;0.48) |
| Exploration | -0.06 (-0.3;0.18) | -0.53 (-0.89;-0.2) | -0.31 (-0.53;-0.1) | 0.02 (-0.17;0.2) | -0.32 (-0.6;-0.08) | -0.11 (-0.31;0.08) | -0.1 (-0.32;0.13) | -0.34 (-0.87;0.02) |
| Stress-sensitivity | -0.05 (-0.28;0.18) | -0.39 (-0.68;-0.1) | -0.22 (-0.44;-0.01) | 0.1 (-0.07;0.3) | -0.12 (-0.37;0.1) | -0.05 (-0.24;0.14) | -0.05 (-0.28;0.18) | -0.34 (-0.88;0.01) |
| Age | ||||||||
| Juvenile | -0.66 (-1.32;-0.06) | -0.78 (-1.68;-0.01) | -0.95 (-1.52;-0.43) | -0.9 (-1.4;-0.45) | -1.59 (-2.57;-0.91) | -0.31 (-0.79;0.15) | -0.58 (-1.18;-0.05) | -0.94 (-2.1;-0.14) |
Let’s look at the predictions.
We’ll first look at effect of age (based on the different criteria) on the infection probability of Anaplasma and Bartonella (the main focus points of this studt)
On the site level
On the within-individual level
The models’ explanatory and predictive power were analyzed using the species-specific AUC and Tjur’s R² (Tjur, 2009) values (Pearce and Ferrier, 2000). Explanatory power was computed by making predictions based on models fitted to all data. Predictive ability was calculated by performing 5-fold cross-validation. The sampling units were assigned randomly to five-folds, and predictions for each fold were based on models fitted to data on the remaining four-folds.
Here is the code for the positive model with eye lens weight to define the different age classes
# Load the data again (Adjust this)
samples = 2000
thin = 1000
nChains = 5 # number of chains
filename = paste("POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),".Rdata",sep = "")
load(filename)
m = models[[1]]
# We next evaluate model fit in terms of explanatory power and prediction power
preds = computePredictedValues(m)
MF = evaluateModelFit(hM=m, predY=preds)
# We also evaluate model fit in terms of predictive power based on two-fold cross-validation (later 5 fold). Doing the latter takes a lot of time, as the model needs to be re-fitted twice. For this reason, we store the results from cross-validation.
# Set run.cross.validation = TRUE to perform the cross-validation and to save the results.
# Set run.cross.validation = FALSE to read in results from cross-validation that you have run previously.
library(statmod)
nm = length(models)
MF = list()
MFCV = list()
WAIC = list()
for(model in 1:nm){
print(paste0("model = ",as.character(model)))
m = models[[model]]
preds = computePredictedValues(m)
MF[[model]] = evaluateModelFit(hM=m, predY=preds)
partition = createPartition(m, nfolds = 5)
preds = computePredictedValues(m,partition=partition,nParallel = 5) #nParallel = nChains
MFCV[[model]] = evaluateModelFit(hM=m, predY=preds)
WAIC[[model]] = computeWAIC(m)
}
filename_out = paste("MODELFIT_POSITIVE_EYELENS_FINAL_MARKDOWN_models_thin_", as.character(thin),
"_samples_", as.character(samples),
"_chains_",as.character(nChains),
".Rdata",sep = "")
save(MF,MFCV,WAIC,modelnames,file = filename_out)
Now, we do it for all the other models as well, but I won’t include the code to save space.
After we done all of this (which takes a lot of time, around 6 days per model), we can make the plots.